folder<-"~/Documents/Work/Stitzel_lab/Islet_sets/Rmarkdown/"
suppressMessages(
source(paste(folder,"functions.R",sep=""))
)
This document describes the computational pipeline of the Motakis et al. Estimation of pathophysiological differences of pancreatic endocrine and exocrine islet cell types in pre-diabetic and type 2 diabetic cells, submitted to …,July 2022.
Pancreatic islet (dys)function is central to glucose homeostasis and type 2 diabetes pathophysiology. They consist of multiple cell types, including discrete endocrine cell types producing distinct hormones that together modulate glucose mobilization or disposal. Single cell transcriptome profiling (scRNA-seq) studies have dissected human islet cellular heterogeneity to define the molecular repertoire of each islet cell (sub)population. However, precise understanding of cell type-specific differences in healthy vs. disease states is lacking, due in part to the limited number of individuals or cells profiled for comparison. Here, we report a single cell transcriptome atlas of 245,878 islet cells obtained from 48 individuals, 17 non-diabetic (ND), 14 pre-diabetic (PD), and 17 type 2 diabetic (T2D) of matched sex, age, and ancestry. We identify marker genes that are robust across disease states for each of 14 cell types detected and observe a significant decrease in the number of beta cells sampled from T2D. Comparison of aggregated (pseudobulk) beta cell scRNA-seq profiles revealed 511 differentially expressed genes in T2D vs ND donors, including monogenic diabetes (e.g. HNF1A) and type 2 diabetes effectors (e.g. DGKB, ASCL2) genes. siRNA knockdown of XX/YY newly identified down-regulated genes that impair Beta cell viability or function. Finally, we describe 8 putative beta cell subpopulations, two of which significantly increase or decrease in T2D vs PD or ND donor islets, respectively. This study provides new and robust, cell type-resolved insights on the cellular and molecular changes in healthy vs diabetic human islets and represents a substantial resource to the islet biology and type 2 diabetes communities.
The pipeline and methodology have been extensively discussed in the submitted manuscript. Here, we will show in detail the computational steps (R script, tables and plots) that generated the main findings and other supporting material. The reader can use our code to replicate the results and to explore other aspects of our data.
The description of the data used in this manuscript can also be found at https://github.com/ArisStefanosSn/Islets_Study. The folder and functions below contain all necessary inforation to replicate the results of this study.folder<-"~/Documents/Work/Stitzel_lab/Islet_sets/Rmarkdown/"
suppressMessages(
source(paste(folder,"functions.R",sep=""))
)
Pancreatic islets were cultured using CMRL, supplemented with 10% FBS, 1% Glutamax,1% Pen/Strep for 14 days. Islet-derived fibroblasts were harvested and gDNA extracted using the Blood & Tissue kit (Qiagen). The RNAse A (Qiagen) treated genomic DNA samples were genotyped using the Infinium Global Diversity Array-8 v1.0 Kit (Illumina).
Single cell capture, barcoding and library preparation were performed using the 10X Chromium platform (https://www.10xgenomics.com) according to the manufacturer’s protocol for chemistries v2 (#CG00052) and v3 (#CG000183). Illumina base call files for all libraries were converted to FASTQs using CellRanger-6.1.2 demultiplexing and count pipelines (https://www.10xgenomics.com). Initially, we used cellranger’s mkfastq to demultiplex the raw base call (BCL) files generated by Illumina sequencers, perform adapter trimming and retrieve the 10-bp length UMI bases to be included into the generated FASTQ files for downstream processing. We loaded the FASTQs to STARsolo with STAR 2.7.9a and, using the v2 / v3 whitelists associated with cellranger v.6 installation, we aligned the reads onto the Ensembl human genome GRCh38 (https://uswest.ensembl.org/Homo_sapiens/Info/Index) for each of the Gel bead-in Emulsions (GEMs) of each library. We filtered out the empty droplets with STARsolo’s EmptyDrops_CR option keeping a median of 7748 cells across libraries for further analysis (25% - 75% IQR: 5891 - 9133). In total 414,082 cell-containing droplets were estimated. Our original target was to sequence 6,000 cells per library. The excess of droplet-containing cells estimated by STARsolo can be probably accounted as false positives whose amount, as the IQRs imply, was library dependent.We generated the single-cell RNA-seq data of 17 non-diabetic (ND), 14 prediabetic (PD) and 17 type 2 diabetic (T2D) human cadavers of matched sex, ethnic and age groups. The separation of ND from PD was done in terms of their measured HbA1c levels (\(HbA1c < 5.9\) for ND and \(5.9 \leq HbA1c \leq 6.4\) for PD). For T2D we considered the specific diagnosis on the donor chart/patient history. Several T2D donors were on medication and exhibited controlled levels of HbA1c (14 out of the 17 donors had \(HbA1c \geq 6.4\)). Cells from 12 donors had their RNA sequenced in multiple donor-specific or genetically multiplexed libraries. Specifically, each of the following islet pairs with IDs Islet70 (PD) / Islet71 (T2D), Islet84 (T2D) / Islet85 (PD) and Islet118 (PD) / Islet119 (T2D) were multiplexed twice (in two different libraries). On the other hand, the data of IDs Islet47 (ND) / Islet48 (T2D), Islet57 (ND) / Islet58 (PD) and Islet59 (ND) / Islet60 (T2D) were sequenced separately (a standalone library per ID) and in multiplex (a pair-specific library). As such, the data of our 48 donors spanned across 54 libraries.
We take a closer look at the islet characteristics by tabulating / plotting the clinical and demographic information of this study. We start by tabulating the categorical variables across conditions, i.e. sex, race, chemistry, center, medication (T2D only) and Cause Of Death against ND, PD and T2D:########################
# load the stored data #
########################
info<-read.table(paste(folder,"islet_info.txt",sep=""),sep="\t",header=T)
head(info)
## Islet Sex BMI Race HbA1c Age Viability Purity Chemistry Center
## 1 Islet123 F 28.3 H 4.6 60 95 90 V3 Scharp-Lacy
## 2 Islet42 M 27.2 W 4.7 66 95 95 V3 Scharp-Lacy
## 3 Islet47 F 22.7 W 4.8 53 95 90 V3 Scharp-Lacy
## 4 Islet29 M 35.6 W 5.1 54 95 80 V2 Wisconsin
## 5 Islet45 F 24.6 W 5.1 50 91 90 V3 Miami
## 6 Islet68 M 29.0 W 5.1 52 98 95 V3 Wisconsin
## Medication COD Condition
## 1 <NA> Head trauma ND
## 2 <NA> Cerebrovascular/stroke ND
## 3 <NA> Cerebrovascular/stroke ND
## 4 <NA> Cerebrovascular/stroke ND
## 5 <NA> Cerebrovascular/stroke ND
## 6 <NA> Head trauma ND
###################################################################################
# sex frequencies and fisher test for independence of rows and columns (matching) #
###################################################################################
table(info$Condition,info$Sex)
##
## F M
## ND 5 12
## PD 4 10
## T2D 7 10
fisher.test(table(info$Condition,info$Sex))
##
## Fisher's Exact Test for Count Data
##
## data: table(info$Condition, info$Sex)
## p-value = 0.796
## alternative hypothesis: two.sided
####################################################################################
# race frequencies and fisher test for independence of rows and columns (matching) #
####################################################################################
table(info$Condition,info$Race)
##
## AA H W
## ND 2 5 10
## PD 2 6 6
## T2D 4 6 7
fisher.test(table(info$Condition,info$Race))
##
## Fisher's Exact Test for Count Data
##
## data: table(info$Condition, info$Race)
## p-value = 0.8114
## alternative hypothesis: two.sided
#########################
# chemistry frequencies #
#########################
table(info$Condition,info$Chemistry)
##
## V2 V3
## ND 4 13
## PD 1 13
## T2D 7 10
######################
# center frequencies #
######################
table(info$Condition,info$Center)
##
## Miami Scharp-Lacy Southern California IsletCell Resource Center UPenn
## ND 2 9 1 0
## PD 2 6 3 1
## T2D 0 10 3 3
##
## Wisconsin
## ND 4
## PD 2
## T2D 1
###################
# COD frequencies #
###################
table(info$Condition,info$COD)
##
## Anoxia cerebrovascular/stroke Cerebrovascular/stroke Head trauma
## ND 0 0 9 7
## PD 8 1 3 1
## T2D 2 1 10 1
##
## head trauma - gunshot wound/suicide
## ND 1
## PD 0
## T2D 0
########################################################
# medication frequencies (T2D only, NR = Non-Reported) #
########################################################
table(info$Condition,info$Medication)
##
## Metformin Metformin (6yrs) Metformin(2yrs) Metformin(3yrs)
## ND 0 0 0 0
## PD 0 0 0 0
## T2D 1 1 1 1
##
## Metformin(5yrs) Metformin(6yrs) NR oral meds
## ND 0 0 0 0
## PD 0 0 0 0
## T2D 1 1 8 3
#####################
# make the age plot #
#####################
fig<-plot_ly(info,x=~Condition,y=~Age,type="box",boxpoints = "all", jitter = 0.3,color=~Condition)
fig
Figure 1.1: Distribution of Age in ND, PD and T2D conditions.
#######################
# Tukey's HSD for Age #
#######################
mod<-aov(Age~Condition,data=info)
TukeyHSD(mod, conf.level=.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Age ~ Condition, data = info)
##
## $Condition
## diff lwr upr p adj
## PD-ND 4.0882353 -5.558538 13.73501 0.5638493
## T2D-ND 4.9411765 -4.226942 14.10930 0.3991783
## T2D-PD 0.8529412 -8.793832 10.49971 0.9750172
#####################
# make the BMI plot #
#####################
fig<-plot_ly(info,x=~Condition,y=~BMI,type="box",boxpoints = "all", jitter = 0.3,color=~Condition)
fig
Figure 1.2: Distribution of BMI in ND, PD and T2D conditions.
#######################
# Tukey's HSD for BMI #
#######################
mod<-aov(BMI~Condition,data=info)
TukeyHSD(mod, conf.level=.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BMI ~ Condition, data = info)
##
## $Condition
## diff lwr upr p adj
## PD-ND 2.277311 -2.2960944 6.850716 0.4553817
## T2D-ND 4.635294 0.2888123 8.981776 0.0342524
## T2D-PD 2.357983 -2.2154222 6.931389 0.4307976
#######################
# make the HbA1c plot #
#######################
fig<-plot_ly(info,x=~Condition,y=~HbA1c,type="box",boxpoints = "all", jitter = 0.3,color=~Condition)
fig
Figure 1.3: Distribution of HbA1c in ND, PD and T2D conditions.
###############################
# t-test of HbA1c in ND vs PD #
###############################
nd_vs_pd<-t.test(info$HbA1c[info$Condition=="ND"],info$HbA1c[info$Condition=="PD"])
nd_vs_pd
##
## Welch Two Sample t-test
##
## data: info$HbA1c[info$Condition == "ND"] and info$HbA1c[info$Condition == "PD"]
## t = -7.8282, df = 27.702, p-value = 1.705e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8594002 -0.5027847
## sample estimates:
## mean of x mean of y
## 5.211765 5.892857
################################
# t-test of HbA1c in ND vs T2D #
################################
nd_vs_t2d<-t.test(info$HbA1c[info$Condition=="ND"],info$HbA1c[info$Condition=="T2D"])
nd_vs_t2d
##
## Welch Two Sample t-test
##
## data: info$HbA1c[info$Condition == "ND"] and info$HbA1c[info$Condition == "T2D"]
## t = -5.525, df = 16.914, p-value = 3.778e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.235543 -1.446810
## sample estimates:
## mean of x mean of y
## 5.211765 7.552941
################################
# t-test of HbA1c in PD vs T2D #
################################
pd_vs_t2d<-t.test(info$HbA1c[info$Condition=="PD"],info$HbA1c[info$Condition=="T2D"])
pd_vs_t2d
##
## Welch Two Sample t-test
##
## data: info$HbA1c[info$Condition == "PD"] and info$HbA1c[info$Condition == "T2D"]
## t = -3.9443, df = 16.472, p-value = 0.001104
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.5502526 -0.7699154
## sample estimates:
## mean of x mean of y
## 5.892857 7.552941
################################
# Bonferroni adjusted p-values #
################################
p<-c(nd_vs_pd$p.value,nd_vs_t2d$p.value,pd_vs_t2d$p.value)
p<-p.adjust(p,"bonferroni")
names(p)<-c("NDvsPD","NDvsT2D","PDvsT2D")
p
## NDvsPD NDvsT2D PDvsT2D
## 5.115146e-08 1.133520e-04 3.311898e-03
###########################
# make the Viability plot #
###########################
fig<-plot_ly(info,x=~Condition,y=~Viability,type="box",boxpoints = "all", jitter = 0.3,color=~Condition)
fig
## Warning: Ignoring 1 observations
Figure 1.4: Distribution of cell viability in ND, PD and T2D conditions.
#############################
# Tukey's HSD for Viability #
#############################
mod<-aov(Viability~Condition,data=info)
TukeyHSD(mod, conf.level=.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Viability ~ Condition, data = info)
##
## $Condition
## diff lwr upr p adj
## PD-ND 0.08403361 -2.164847 2.332915 0.9954820
## T2D-ND -0.93382353 -3.104258 1.236611 0.5537691
## T2D-PD -1.01785714 -3.298256 1.262541 0.5297632
########################
# make the Purity plot #
########################
fig<-plot_ly(info,x=~Condition,y=~Purity,type="box",boxpoints = "all", jitter = 0.3,color=~Condition)
fig
## Warning: Ignoring 1 observations
Figure 1.5: Distribution of cell purity in ND, PD and T2D conditions.
##########################
# Tukey's HSD for Purity #
##########################
mod<-aov(Purity~Condition,data=info)
TukeyHSD(mod, conf.level=.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Purity ~ Condition, data = info)
##
## $Condition
## diff lwr upr p adj
## PD-ND -3.235294 -7.257157 0.7865687 0.1366065
## T2D-ND -4.110294 -7.991864 -0.2287243 0.0358008
## T2D-PD -0.875000 -4.953228 3.2032280 0.8618149
We employed the multi-step quality control pipeline, described in the following paragraphs, to clean and analyze our heterogeneous dataset. We performed read alignment and gene quantification with STARsolo giving us a starting set of 414,082 cell-containing droplets. Working with each library separately, we decontaminated the ambient RNA with SoupX and, where needed, we deconvoluted the donor information with Demuxlet using estimated donor-specific SNPs from the genotype information. We estimated homotypic (different cell types) doublets by Scrublet and DoubletFinder and used a multi-criteria quality control approach considering features, UMIs, percentage of reads mapped to the mitochondrial genome and markers analysis to filter out low-quality cells.
The high-quality cells were merged, integrated with Harmony, cleaned further for doublets and annotated using well known and estimated markers from differential expression analysis. Focusing on the donor variability, we converted the heterogeneous single-cell RNA-seq into pseudobulk raw counts that were fed into a differential expression analysis model to compare, for each cell type, the three disease states, ND vs PD s T2D, across donors (replicates). Further insights into the differences across the three disease states were gained via cell type subclustering and subsequent differential expression at the single-cell level.